Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation for versioning with DVC in Kedro #4443

Merged
merged 32 commits into from
Feb 12, 2025

Conversation

lrcouto
Copy link
Contributor

@lrcouto lrcouto commented Jan 27, 2025

Description

Development notes

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

lrcouto and others added 5 commits January 27, 2025 10:46
Signed-off-by: Laura Couto <[email protected]>
Signed-off-by: Laura Couto <[email protected]>
Signed-off-by: Laura Couto <[email protected]>
Signed-off-by: Laura Couto <[email protected]>
@astrojuanlu
Copy link
Member

@lrcouto lrcouto marked this pull request as ready for review January 29, 2025 14:02
Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to follow the instructions but found some blockers, so I didn't review all of the document

Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being a pain in the neck 🙏🏼

I think we need to decide who's the audience of this guide.

  • Is it users that don't know much about Kedro or DVC?
  • Or is it Kedro users that already know their way around more or less?

In the first case, I think this should almost read like a tutorial and the commands and the steps should be fool-proof. For example, instead of clarifying about the .gitignore after dvc add, we should add a step before that reads like

Since the spaceflights-pandas starter ignores everything under data/ by default, you have to update the .gitignore file as follows:

Otherwise, if we assume that the reader knows a bit about Kedro, we can make the guide less prescriptive. In that case, then the sentence from the beginning "For this example, we will be using a Kedro spaceflights-pandas starter project" should be modified, maybe like

For example, you can use the spaceflights-pandas starter, see the documentation about starters [link]


As an example of this, I've been always inspired by the work @/melissawm did on the NumPy tutorials, for example

We don't have to follow the same structure but at least having a clear reader persona in mind helps structure the rest of the document 💡

@lrcouto
Copy link
Contributor Author

lrcouto commented Feb 4, 2025

Sorry for being a pain in the neck 🙏🏼

I think we need to decide who's the audience of this guide.

* Is it users that don't know much about Kedro or DVC?

* Or is it Kedro users that already know their way around more or less?

In the first case, I think this should almost read like a tutorial and the commands and the steps should be fool-proof. For example, instead of clarifying about the .gitignore after dvc add, we should add a step before that reads like

Since the spaceflights-pandas starter ignores everything under data/ by default, you have to update the .gitignore file as follows:

Otherwise, if we assume that the reader knows a bit about Kedro, we can make the guide less prescriptive. In that case, then the sentence from the beginning "For this example, we will be using a Kedro spaceflights-pandas starter project" should be modified, maybe like

For example, you can use the spaceflights-pandas starter, see the documentation about starters [link]

As an example of this, I've been always inspired by the work @/melissawm did on the NumPy tutorials, for example

* https://numpy.org/numpy-tutorials/content/tutorial-svd.html

* https://numpy.org/numpy-tutorials/content/tutorial-ma.html

We don't have to follow the same structure but at least having a clear reader persona in mind helps structure the rest of the document 💡

These are some fair points. I think the step-by-step tutorial approach might be better, I usually don't like to make assumptions about an user's knowledge level.

Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still some more work needed

@astrojuanlu
Copy link
Member

RTD failure is unrelated

/home/docs/checkouts/readthedocs.org/user_builds/kedro/checkouts/4443/docs/source/notebooks_and_ipython/notebook-example/add_kedro_to_a_notebook.md:155:broken link: https://towardsdatascience.com/five-software-engineering-principles-for-collaborative-data-science-ab26667a311 (404 Client Error: Not Found for url: https://towardsdatascience.com/five-software-engineering-principles-for-collaborative-data-science-ab26667a311)

(they took down our blog post? 🥹)

@ankatiyar
Copy link
Contributor

RTD failure is unrelated

/home/docs/checkouts/readthedocs.org/user_builds/kedro/checkouts/4443/docs/source/notebooks_and_ipython/notebook-example/add_kedro_to_a_notebook.md:155:broken link: https://towardsdatascience.com/five-software-engineering-principles-for-collaborative-data-science-ab26667a311 (404 Client Error: Not Found for url: https://towardsdatascience.com/five-software-engineering-principles-for-collaborative-data-science-ab26667a311)

(they took down our blog post? 🥹)

It moved to medium

Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lrcouto and @ankatiyar , almost there. Left a few more comments. Also please pay attention to the Vale warnings, most of them are worth addressing.

@astrojuanlu
Copy link
Member

I resolved all review comments from previous passes that no longer apply, please have a look at the rest.

lrcouto and others added 3 commits February 12, 2025 00:28
@lrcouto
Copy link
Contributor Author

lrcouto commented Feb 12, 2025

RTD failure is unrelated

/home/docs/checkouts/readthedocs.org/user_builds/kedro/checkouts/4443/docs/source/notebooks_and_ipython/notebook-example/add_kedro_to_a_notebook.md:155:broken link: https://towardsdatascience.com/five-software-engineering-principles-for-collaborative-data-science-ab26667a311 (404 Client Error: Not Found for url: https://towardsdatascience.com/five-software-engineering-principles-for-collaborative-data-science-ab26667a311)

(they took down our blog post? 🥹)

It moved to medium

What is the new link? The main branch is still using the broken one.

Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks @lrcouto ! 1 final comment: please replace the use of first person ("we") with second person ("you")

https://ddbeck.com/first-person-plural-is-questionable/

https://developers.google.com/style/person

@ankatiyar could you give this another review pass?

Copy link
Contributor

@ankatiyar ankatiyar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor comments, the doc looks great! Thanks @lrcouto! ⭐

@lrcouto lrcouto merged commit d16badf into main Feb 12, 2025
10 checks passed
@lrcouto lrcouto deleted the kedro-dvc-documentation branch February 12, 2025 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants